\subsection{Simple example}
Consider a function \(\mathcal F\colon \mathbb R^2 \to \mathbb R\) defined by
\[
	\mathcal F(\vb x) = 2x_1^2 - 4x_1x_2 + 5x_2^2
\]
This can be simplified by writing
\[
	\mathcal F(\vb x) = x_1'^2 + 6x_2'^2
\]
where
\[
	x_1' = \frac{1}{\sqrt 5}(2x_1 + x_2);\quad x_2' = \frac{1}{\sqrt 5}(-x_1 + 2x_2)
\]
This can be found by writing \(\mathcal F(\vb x) = \vb x^\transpose A\vb x\) where
\[
	A = \begin{pmatrix}
		2 & -2 \\ -2 & 5
	\end{pmatrix}
\]
by inspection from the original equation, and then diagonalising \(A\).
We find the eigenvalues to be \(\lambda = 1, 6\), with eigenvectors
\[
	\frac{1}{\sqrt 5} \begin{pmatrix}
		2 \\ 1
	\end{pmatrix};\quad \frac{1}{\sqrt 5}\begin{pmatrix}
		-1 \\ 2
	\end{pmatrix}
\]
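The diagonalisation above can be checked numerically; the following is a quick sketch, assuming \texttt{numpy} is available.

```python
# Numerical check of the worked example above (assumes numpy).
import numpy as np

# Symmetric matrix read off from F(x) = 2 x1^2 - 4 x1 x2 + 5 x2^2.
A = np.array([[2.0, -2.0],
              [-2.0, 5.0]])

# eigh handles real symmetric matrices: eigenvalues in ascending order,
# orthonormal eigenvectors as the columns of P.
eigenvalues, P = np.linalg.eigh(A)
print(np.allclose(eigenvalues, [1.0, 6.0]))  # -> True

# Verify F(x) = x'^T D x' with x' = P^T x at an arbitrary point.
x = np.array([0.3, -1.2])
x_prime = P.T @ x
F_original = x @ A @ x
F_diagonal = np.sum(eigenvalues * x_prime**2)
print(np.isclose(F_original, F_diagonal))  # -> True
```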

\subsection{Diagonalising quadratic forms}
In general, a quadratic form is a function \(\mathcal F\colon \mathbb R^n \to \mathbb R\) given by
\[
	\mathcal F(\vb x) = \vb x^\transpose A \vb x = \sum_{i,j} x_i A_{ij} x_j
\]
where \(A\) is a real symmetric \(n \times n\) matrix.
Any antisymmetric part of \(A\) would not contribute to the result, so there is no loss of generality under this restriction.
From the section above, we know we can write \(P^\transpose A P = D\) where \(D\) is a diagonal matrix containing the eigenvalues, and \(P\) is constructed from the eigenvectors, with orthonormal columns \(\vb u_i\).
Setting \(\vb x' = P^\transpose \vb x\), or equivalently \(\vb x = P \vb x'\), we have
\begin{align*}
	\mathcal F(\vb x) & = \vb x^\transpose A \vb x                                              \\
	                  & = (P \vb x')^\transpose A (P \vb x')                                    \\
	                  & = (\vb x')^\transpose P^\transpose A P \vb x'                           \\
	                  & = (\vb x')^\transpose D \vb x'                                          \\
	                  & = \sum_i \lambda_i x_i'^2 = \lambda_1 x_1'^2 + \lambda_2 x_2'^2 + \dots
\end{align*}
We say that \(\mathcal F\) has been diagonalised.
Now, note that
\begin{align*}
	\vb x' & = x_1'\vb e_1 + \dots + x_n'\vb e_n \\
	\vb x  & = x_1\vb e_1 + \dots + x_n\vb e_n   \\
	       & = x_1'\vb u_1 + \dots + x_n'\vb u_n
\end{align*}
where the \(\vb e_i\) are the standard basis vectors, since
\[
	x_i' = \vb u_i \cdot \vb x \iff \vb x' = P^\transpose \vb x
\]
Hence the \(x_i'\) can be regarded as coordinates with respect to a new set of axes defined by the orthonormal eigenvector basis, known as the principal axes of the quadratic form.
They are related to the standard axes (given by basis vectors \(\vb e_i\)) by the orthogonal transformation \(P\).
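The construction \(P^\transpose A P = D\) can be sketched in code; the matrix chosen below is purely illustrative, and \texttt{numpy} is assumed.

```python
# Sketch of diagonalising a quadratic form numerically (numpy assumed).
import numpy as np

def diagonalise_quadratic_form(A):
    """Return (eigenvalues, P) with P orthogonal and P^T A P diagonal.

    A must be real symmetric; np.linalg.eigh exploits this and returns
    orthonormal eigenvectors as the columns of P.
    """
    eigenvalues, P = np.linalg.eigh(A)
    return eigenvalues, P

# Any real symmetric matrix works; this one is an arbitrary example.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, P = diagonalise_quadratic_form(A)

# P^T A P should equal diag(lambda_1, ..., lambda_n),
# and P should be orthogonal: P P^T = I.
print(np.allclose(P.T @ A @ P, np.diag(lam)))  # -> True
print(np.allclose(P @ P.T, np.eye(3)))         # -> True
```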

\begin{example}[two dimensions]
Consider \(\mathcal F(\vb x) = \vb x^\transpose A \vb x\) with
\[
	A = \begin{pmatrix}
		\alpha & \beta \\ \beta & \alpha
	\end{pmatrix}
\]
The eigenvalues are \(\lambda = \alpha + \beta, \alpha - \beta\) and
\[
	\vb u_1 = \frac{1}{\sqrt 2}\begin{pmatrix}
		1 \\ 1
	\end{pmatrix};\quad \vb u_2 = \frac{1}{\sqrt 2}\begin{pmatrix}
		-1 \\ 1
	\end{pmatrix}
\]
So in terms of the standard basis vectors,
\[
	\mathcal F(\vb x) = \alpha x_1^2 + 2\beta x_1x_2 + \alpha x_2^2
\]
And in terms of our new basis vectors,
\[
	\mathcal F(\vb x) = (\alpha + \beta) x_1'^2 + (\alpha - \beta) x_2'^2
\]
where
\begin{align*}
	x_1' & = \vb u_1 \cdot \vb x = \frac{1}{\sqrt 2}(x_1 + x_2)  \\
	x_2' & = \vb u_2 \cdot \vb x = \frac{1}{\sqrt 2}(-x_1 + x_2)
\end{align*}
Taking for example \(\alpha = \frac{3}{2}, \beta = -\frac{1}{2}\), we have \(\lambda_1 = 1, \lambda_2 = 2\).
If we choose \(\mathcal F = 1\), this represents an ellipse in our new coordinate system:
\[
	x_1'^2 + 2x_2'^2 = 1
\]
If instead we choose \(\alpha = -\frac{1}{2}, \beta = \frac{3}{2}\), we now have \(\lambda_1 = 1, \lambda_2 = -2\).
The locus at \(\mathcal F = 1\) gives a hyperbola:
\[
	x_1'^2 - 2x_2'^2 = 1
\]
\end{example}
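The two cases in the example can be verified numerically; this sketch assumes \texttt{numpy}.

```python
# Check the eigenvalues lambda = alpha +/- beta of the 2x2 example (numpy assumed).
import numpy as np

def eigen_of_alpha_beta(alpha, beta):
    """Eigen-decomposition of [[alpha, beta], [beta, alpha]]."""
    A = np.array([[alpha, beta], [beta, alpha]])
    return np.linalg.eigh(A)

# alpha = 3/2, beta = -1/2: expect lambda = 1 and 2 (F = 1 is an ellipse).
lam, P = eigen_of_alpha_beta(1.5, -0.5)
print(np.allclose(np.sort(lam), [1.0, 2.0]))  # -> True

# alpha = -1/2, beta = 3/2: expect lambda = 1 and -2 (F = 1 is a hyperbola).
lam, P = eigen_of_alpha_beta(-0.5, 1.5)
print(np.allclose(np.sort(lam), [-2.0, 1.0]))  # -> True
```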

\begin{example}[three dimensions]
In \(\mathbb R^3\), note that if \(\lambda_1, \lambda_2, \lambda_3\) are all strictly positive, then \(\mathcal F = 1\) gives an ellipsoid.
This is analogous to the \(\mathbb R^2\) case above.

Let us consider an example.
Earlier, we found that the eigenvalues of the matrix \(A\) where
\[
	A = \begin{pmatrix}
		0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0
	\end{pmatrix}
\]
are \(\lambda_1 = \lambda_2 = -1, \lambda_3 = 2\), where
\[
	\vb u_1 = \frac{1}{\sqrt 2} \begin{pmatrix}
		1 \\ -1 \\ 0
	\end{pmatrix};\quad \vb u_2 = \frac{1}{\sqrt 6}\begin{pmatrix}
		1 \\ 1 \\ -2
	\end{pmatrix};\quad \vb u_3 = \frac{1}{\sqrt 3}\begin{pmatrix}
		1 \\ 1 \\ 1
	\end{pmatrix}
\]
Then
\begin{align*}
	\mathcal F(\vb x) & = 2x_1x_2 + 2x_2x_3 + 2x_3x_1 \\
	                  & = -x_1'^2 -x_2'^2 + 2x_3'^2
\end{align*}
Now, \(\mathcal F = 1\) corresponds to
\[
	2x_3'^2 = 1 + x_1'^2 + x_2'^2
\]
So we can see more clearly that this is a hyperboloid of two sheets in \(\mathbb R^3\), with rotational symmetry about the \(x_3'\) axis.
Further, \(\mathcal F = -1\) corresponds to
\[
	1 + 2x_3'^2 = x_1'^2 + x_2'^2
\]
Here, this is a hyperboloid of one sheet, since for any fixed \(x_3'\) coordinate it defines a circle in the \(x_1'\), \(x_2'\) plane.
\end{example}
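The three-dimensional example can likewise be checked numerically (a sketch, assuming \texttt{numpy}).

```python
# Numerical check of the three-dimensional example (numpy assumed).
import numpy as np

A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

# Expect the repeated eigenvalue -1 and the simple eigenvalue 2.
eigenvalues, P = np.linalg.eigh(A)
print(np.allclose(np.sort(eigenvalues), [-1.0, -1.0, 2.0]))  # -> True

# In principal-axis coordinates, F(x) = -x1'^2 - x2'^2 + 2 x3'^2.
x = np.array([0.2, -0.7, 1.1])
x_prime = P.T @ x
print(np.isclose(x @ A @ x, np.sum(eigenvalues * x_prime**2)))  # -> True
```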

\subsection{Hessian matrix as a quadratic form}
Consider a smooth function \(f\colon \mathbb R^n \to \mathbb R\) with a stationary point at \(\vb x = \vb a\), i.e.\ \(\frac{\partial f}{\partial x_i} = 0\) at \(\vb x = \vb a\).
By Taylor's theorem,
\[
	f(\vb a + \vb h) = f(\vb a) + \mathcal F(\vb h) + O(\abs{\vb h}^3)
\]
where \(\mathcal F\) is a quadratic form with
\[
	A_{ij} = \frac{1}{2}\frac{\partial^2 f}{\partial x_i\partial x_j}
\]
all evaluated at \(\vb x = \vb a\).
Note that this \(A\) is half of the Hessian matrix, and that the linear term vanishes since we are at a stationary point.
Rewriting this \(\vb h\) in terms of the eigenvectors of \(A\) (the principal axes), we have
\[
	\mathcal F = \lambda_1 h_1'^2 + \lambda_2 h_2'^2 + \dots + \lambda_n h_n'^2
\]
So clearly if \(\lambda_i > 0\) for all \(i\), then \(f\) has a minimum at \(\vb x = \vb a\).
If \(\lambda_i < 0\) for all \(i\), then \(f\) has a maximum at \(\vb x = \vb a\).
Otherwise, it has a saddle point.
Note that in two dimensions it is often sufficient to consider the trace and determinant of \(A\), since \(\trace A = \lambda_1 + \lambda_2\) and \(\det A = \lambda_1\lambda_2\).
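The classification of stationary points by eigenvalue signs can be sketched as follows; \texttt{classify\_stationary\_point} is a hypothetical helper written for illustration, not a function from these notes, and \texttt{numpy} is assumed.

```python
# Classify a stationary point from the eigenvalues of the Hessian (numpy
# assumed; classify_stationary_point is a hypothetical illustrative helper).
import numpy as np

def classify_stationary_point(H):
    """Classify a stationary point given its real symmetric Hessian H."""
    lam = np.linalg.eigvalsh(H)
    if np.any(np.isclose(lam, 0.0)):
        # A zero eigenvalue means the quadratic term is inconclusive;
        # higher-order terms in the Taylor expansion would be needed.
        return "degenerate"
    if np.all(lam > 0):
        return "minimum"
    if np.all(lam < 0):
        return "maximum"
    return "saddle point"

# f(x, y) = x^2 + y^2 has Hessian 2I at the origin: a minimum.
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 2.0]])))   # -> minimum

# f(x, y) = x^2 - y^2 has Hessian diag(2, -2): a saddle point.
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # -> saddle point
```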
